Allow reusing computation and data between CVC objects #644

giacomofiorin · 2023-12-19T19:51:23Z

This draft PR contains an implementation of the code to reuse the output of certain computations between colvar component (CVC) objects. The scheme aims to provide support a broader set of use cases than the ones described in #232, i.e. it aims to share specific pieces of computation between CVCs of different types, as well as allowing concurrent application of forces.

The proposed scheme works as follows. In the configuration string of CVC_B, a reference is provided to CVC_A, which is an object been defined previously in the Colvars config. The type of CVC_A may be the same as CVC_B (in which case its value is simply copied), or it may be a base class of CVC_B, or the same as one of its members. In the latter two cases, It's assumed that the computation of CVC_B can be written as follows:

Read atomic coordinates
Compute a quantity using CVC_A's code
Store that quantity in a member or temporary variable
Compute another quantity using CVC_B's code, but using the previous quantity as input

When CVC_A is an object that was already computed, CVC_B can skip the first two steps altogether, and just replace the third step with an assignment.

Potential use cases are the following. This PR contains so far only an implementation for the first case.

Reusing an orientation component in a tilt, spinAngle, euler{Phi,Theta,Psi}, component.
Reusing a distanceVec component as input to distance, distanceZ, distanceXY, etc.
Reusing a generic "sum of squares" as input for gyration, inertia, inertiaZ, etc
[ not in this PR ] Reusing the computation of the individual "nodes" in a pair of path CVs ("s" and "z").
[ not in this PR ] Reusing the optimal rotation in a RMSD, and the latter in an eigenvector variable (Note: this will take some work, as those two classes have substantial feature creep).

In all of the above, the speedup would be a factor of 2 or 3, assuming that the explicit loop over atoms is speed-limiting and depending on the configuration. (The inertia tensor has six independent components, but I assume that very few would be interested in biasing them all!)

Opening as a draft because the SMP loop is currently unaware of the mutual dependencies, leading to race conditions. I would also like to cover some of the simpler use cases listed above. (More complex ones, e.g. RMSD and path CVs, should be in their own PRs.)

Implements #232

…onents

HanatoK · 2024-06-25T15:09:54Z

src/colvarcomp.cpp

+  auto *base_ptr = cvm::main()->get_component_by_name(cvc_name);
+  cvc *cvc_ptr = dynamic_cast<cvc *>(base_ptr);
+  if (cvc_ptr) {
+    precomputed_cvcs[id] = std::shared_ptr<cvc>(dynamic_cast<cvc *>(cvc_ptr));


I am changing the returning type of get_component_by name to shared_ptr. I am also not sure why dynamic_cast is called twice.

Because it's a draft? :-)

It's a very dynamic cast

HanatoK · 2024-06-25T15:24:06Z

Hi @giacomofiorin! It seems there is no change to the SMP related code, but maybe I miss somethign. If a cvc reuses other ones, how does the SMP work?

giacomofiorin · 2024-06-25T15:31:48Z

Hi @giacomofiorin! It seems there is no change to the SMP related code, but maybe I miss somethign. If a cvc reuses other ones, how does the SMP work?

See the last paragraph description. What I would like to do is keep a list of reusable CVCs, and run those first over SMP.

HanatoK · 2024-06-25T15:35:32Z

Hi @giacomofiorin! It seems there is no change to the SMP related code, but maybe I miss somethign. If a cvc reuses other ones, how does the SMP work?

See the last paragraph description. What I would like to do is keep a list of reusable CVCs, and run those first over SMP.

Thanks! I am working on linearCombination and customColvar at first, and they are supposed to reuse all other CVCs. In my opinion, all CVCs should be reusable. Maybe we can add a new user option like reuseThisCV for all CVCs to let the users control whether a CVC is reused.

giacomofiorin · 2024-06-25T15:46:00Z

Thanks! I am working on linearCombination and customColvar at first, and they are supposed to reuse all other CVCs. In my opinion, all CVCs should be reusable. Maybe we can add a new user option like reuseThisCV for all CVCs to let the users control whether a CVC is reused.

That's one option, but then all CVC classes would then need to have the necessary API changes. Okay with you if we discuss more next week?

HanatoK · 2024-06-25T15:47:57Z

Thanks! I am working on linearCombination and customColvar at first, and they are supposed to reuse all other CVCs. In my opinion, all CVCs should be reusable. Maybe we can add a new user option like reuseThisCV for all CVCs to let the users control whether a CVC is reused.

That's one option, but then all CVC classes would then need to have the necessary API changes. Okay with you if we discuss more next week?

OK.

giacomofiorin · 2024-06-25T15:48:53Z

Also, linearCombination and customColvar are cheap enough that we can always run them at the end, outside of SMP (the same as the polynomial superposition code in colvar::collect_cvc_values()).

So it would make sense to make them special cases.

HanatoK · 2024-06-25T18:56:56Z

I just doubt if it is really necessary to distribute the CVCs to different threads. On one hand, if the thread for CVC_A takes longer time to finish, then the thread for CVC_B has to wait. On the other hand, it makes the reusable CVCs complicated. I still think it would be better if Colvars can give up distributing CVCs among threads completely, and use the threads in fine-grained cases, such as calculating correlation and overlapping matrices in optimal rotation, rotating atoms, and projecting a hill in metadynamics...

jhenin · 2024-06-26T12:41:29Z

The main use case for distributing CVCs over threads is that of multiple, expensive CVCs of similar computational cost. It is hard to know how widespread it is, but in that case, the current implementation has the benefit of improving performance transparently without user intervention.

HanatoK · 2024-06-26T14:21:25Z

The main use case for distributing CVCs over threads is that of multiple, expensive CVCs of similar computational cost. It is hard to know how widespread it is, but in that case, the current implementation has the benefit of improving performance transparently without user intervention.

If the expensive CVCs refer to RMSD, anything related to optimal alignment and coordNum, then it is still better to distribute the expensive loops in a single CVC over threads instead of distribute CVCs themselves over threads. The former approach is more friendly to CPU caches, especially the shared L3 cache, because the CPU could fetch a large chunk of continuous data, instead of many small blocks, from the main memory to L3 directly for all threads. Moreover, distributing the inner loops in a CVC over threads can guarantee to accelerate the calculation of the most expensive CVC. Distributing CVCs over threads means that the slowest CVC uses always a single thread, and if there are other fast CVs (for example, using RMSD, gyration and a few distances), then the other threads have to wait for the slowest done.

HanatoK · 2024-06-26T22:42:05Z

I am looking into ways to parallelize the computation of CVCs even if there are interdependencies. With the dependencies the CVCs are in a compute graph. A way to parallelize the computation could be:

Considering A is the parent of B if A reuses B;
Finding the CVCs that have no parents as "starting points" (similar to the pilot action in PLUMED if you are familiar with its code);
Traversing the compute graph and finding the maximum depth of all CVCs from the starting points;
Now each CVC should has different depth, and then we can group CVCs according to their max depths into vectors.
Step 4 gives us a vector of vectors, and then we can compute all CVCs by reverisbly iterating the vector, and parallelizing the CVCs that have the same max depth.

For example, in the following compute graph,

we can compute all "depth=3" cvcs at first, and the all "depth=2". Each depth group could be parallelized. The code can be found in https://github.com/HanatoK/miscellaneous_scripts/blob/8ed465bbe5f7ccf0d950b7d10449be95443c386d/walk_graph/test.cpp#L112-L143

Now the major issue is that Colvars actually does not distribute CVCs over threads, and instead, the parallelization scheme is worse than I expect. It distributes the outermost colvar over threads!

HanatoK · 2024-07-01T19:32:26Z

There should be an additional reusability improvement about applying biasing force only to a single value of a vector CV, so that we can safely compute some CVs in a single CVC and output a vector, and then bias only a scalar part of the vector.

giacomofiorin · 2024-07-02T16:45:26Z

Now the major issue is that Colvars actually does not distribute CVCs over threads, and instead, the parallelization scheme is worse than I expect. It distributes the outermost colvar over threads!

Hi @HanatoK the loop is done over colvar objects to avoid having the top-level class reach too deep into the CVCs. However, in the case of a colvar with multiple CVCs, that colvar will appear in the loop multiple times to achieve parallelism over the entire collection of CVC objects.

colvars/src/colvarmodule.h

Lines 273 to 277 in 0f2d682

    
           /// Collective variables to be calculated on different threads; 
        
           /// colvars with multple items (e.g. multiple active CVCs) are duplicated 
        
           std::vector<colvar *> colvars_smp; 
        
           /// Indexes of the items to calculate for each colvar 
        
           std::vector<int> colvars_smp_items;

Your other comments all make sense, especially regarding building a dependency tree and do multiple parallel loops, one for each level. This PR is meant to introduce only one level to start with.

giacomofiorin added the optimization label Dec 19, 2023

giacomofiorin mentioned this pull request Feb 20, 2024

Several improvements to colvar component classes #643

Merged

giacomofiorin force-pushed the reusable-cvcs branch 2 times, most recently from 14dbb58 to d1333a7 Compare February 21, 2024 23:44

giacomofiorin mentioned this pull request Mar 14, 2024

Protein-specific coordinates based on index groups #666

Merged

5 tasks

giacomofiorin force-pushed the reusable-cvcs branch from d1333a7 to 1b984a4 Compare March 19, 2024 19:24

giacomofiorin added 10 commits March 19, 2024 16:37

Add a global register of all colvar components

778d05e

Report list of objects after initialization

cb0aee0

Modernize memory management in colvarmodule object

1dd0297

Move tilt/spinAngle functions into quaternion class

19ebddf

Add dependency features for CVCs that reuse each other's computation

b5af15f

Reuse optimal rotation in all orientation-derived components

5814651

Do add padding space in traj file when colvars or biases report nothing

116b9e3

Add test for spinAngle CVC reusing a quaternion CVC

1f91f1a

Only allow reusing components that are not already reusing other comp…

1fce7f7

…onents

Fix order of deinitialization

5ec6f56

giacomofiorin force-pushed the reusable-cvcs branch from 1b984a4 to 5ec6f56 Compare March 19, 2024 20:37

HanatoK reviewed Jun 25, 2024

View reviewed changes

HanatoK mentioned this pull request Jul 1, 2024

[RFC] Work towards reusable CVCs #700

Draft

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow reusing computation and data between CVC objects #644

Allow reusing computation and data between CVC objects #644

giacomofiorin commented Dec 19, 2023 •

edited

Loading

HanatoK Jun 25, 2024

giacomofiorin Jun 25, 2024

jhenin Jun 26, 2024

HanatoK commented Jun 25, 2024

giacomofiorin commented Jun 25, 2024

HanatoK commented Jun 25, 2024

giacomofiorin commented Jun 25, 2024

HanatoK commented Jun 25, 2024

giacomofiorin commented Jun 25, 2024

HanatoK commented Jun 25, 2024

jhenin commented Jun 26, 2024

HanatoK commented Jun 26, 2024

HanatoK commented Jun 26, 2024

HanatoK commented Jul 1, 2024

giacomofiorin commented Jul 2, 2024

Allow reusing computation and data between CVC objects #644

Are you sure you want to change the base?

Allow reusing computation and data between CVC objects #644

Conversation

giacomofiorin commented Dec 19, 2023 • edited Loading

HanatoK Jun 25, 2024

Choose a reason for hiding this comment

giacomofiorin Jun 25, 2024

Choose a reason for hiding this comment

jhenin Jun 26, 2024

Choose a reason for hiding this comment

HanatoK commented Jun 25, 2024

giacomofiorin commented Jun 25, 2024

HanatoK commented Jun 25, 2024

giacomofiorin commented Jun 25, 2024

HanatoK commented Jun 25, 2024

giacomofiorin commented Jun 25, 2024

HanatoK commented Jun 25, 2024

jhenin commented Jun 26, 2024

HanatoK commented Jun 26, 2024

HanatoK commented Jun 26, 2024

HanatoK commented Jul 1, 2024

giacomofiorin commented Jul 2, 2024

giacomofiorin commented Dec 19, 2023 •

edited

Loading